The Plant Phenome Journal — Latest Matching Preprints

1

DeepPheno: A Deep Learning Framework for Linking Hyperspectral Imaging and SNP Genotypes in Lettuce

Okyere, F. G. G.; Mehrem, S. L.; Snoek, B. L.; Van den Ackerveken, G.; Abeln, S.

2026-07-10 plant biology 10.64898/2026.07.09.737449 medRxiv

Top 0.1%

15.1%

While whole-genome sequencing captures millions of single nucleotide polymorphisms (SNPs) and hyperspectral imaging (HSI) enables non-destructive plant phenotyping, integrating these modalities to link genotype to phenotype remains challenging due to their high dimensionality and non-linearity. This study presents DeepPheno, a deep learning framework that predicts SNP genotypes from HSI data, using model predictability as a proxy for genotype-phenotype association. HSI data were acquired from 194 lettuce genotypes under field conditions. HSI data patches (20 × 20 pixels × 224 spectral bands) were used to train a hybrid CNN to predict the variant of a specific SNP. The framework was validated on SNPs with known phenotypic effects (anthocyanin, leaf serration, pale pigmentation), achieving high predictive performance (AUC ranging from 0.806 to 0.935), whereas models trained on randomly shuffled labels performed at chance (mean AUC ≈ 0.51). When the workflow was extended to 50 randomly selected, putatively neutral SNPs, most yielded low predictability, but two showed high performance (AUC > 0.76), suggesting uncharacterized genotype-phenotype links. Explainable AI, including SHAP and Grad-CAM, identified relevant spectral and spatial features driving these predictions, particularly the green and red-edge wavelengths associated with pigment dynamics and leaf structure. These results establish a framework for understanding complex genotype-phenotype interactions in plants and extracting these links from HSI data without predefining the exact trait values. It provides an avenue for high-throughput trait discovery and description and extends the integration of image-based phenomics with plant genetics.
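The comparison at the heart of this abstract, an AUC on real labels against an AUC on shuffled labels, can be illustrated with a minimal Python sketch. The labels and scores are invented toy values, and `roc_auc` and `shuffled_auc` are hypothetical helper names. This is not the DeepPheno code, only an illustration of the metric and its chance-level baseline.

```python
import random


def roc_auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def shuffled_auc(labels, scores, n_perm=200, seed=0):
    """Mean AUC after randomly permuting the labels (chance-level baseline)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_perm):
        perm = labels[:]
        rng.shuffle(perm)
        total += roc_auc(perm, scores)
    return total / n_perm


# Toy data (assumed): 1 = alternate allele, 0 = reference allele.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

real_auc = roc_auc(labels, scores)
null_auc = shuffled_auc(labels, scores)
print(f"real-label AUC = {real_auc:.4f}, shuffled-label AUC = {null_auc:.2f}")
```

A real-label AUC well above the shuffled-label mean is what the abstract treats as evidence of a genotype-phenotype association.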

2

Leaf movements as a quantitative metric for early stress detection

Herrero, E.; Wijeweera, S.; Gill, A. R.; Bampton, C.; Sullivan, W.; Stamford, J. D.; Bromley, J.; Antoniades, A. Z.; Mortimer, J. C.; Webb, A. A. R.; Gilliham, M.; Millar, A. H.

2026-07-08 plant biology 10.64898/2026.06.16.732190 medRxiv

Top 0.1%

6.0%

Early, precise, and non-destructive stress detection is essential for maintaining crop productivity, particularly in high-density plant growth systems like controlled environment agriculture (CEA), where manual monitoring is often impractical. Using plant motion as a proxy for growth and plant health, we demonstrate a method for early, non-invasive stress detection through quantitative leaf-movement analysis in lettuce and five other CEA-relevant crops. Leaf-movement dynamics under stress were imaged with a low-cost, scalable Raspberry Pi imaging setup and quantified using a repurposed open-source motion estimation algorithm, Tracking Rhythms in Plants (TRiP). Our system detected stress-induced changes in leaf-movement within 1 hour of stress, with the timing dependent on the nature of the stress. Sustained reductions in leaf-movement coincide with decreased biomass accumulation. This approach offers a non-invasive, rapid, scalable, and cost-effective solution for continuous crop monitoring, with potential for application in both terrestrial and space farming CEA systems.

Graphical abstract: Quantification of leaf-movement dynamics as a high-throughput proxy for plant physiological status, enabling early stress detection and timely intervention to mitigate yield penalties in CEA settings (image made with biorender.org).

3

A multiregional image-text dataset and benchmark for vision-language modeling of plant diseases

Nguyen, T. V.; Quoc, K. N.; Harwath, D.; Quach, L.-D.; Dao, P. D.

2026-07-09 plant biology 10.64898/2026.07.01.735881 medRxiv

Top 0.1%

5.6%

Plant diseases remain a major challenge to global food production, and timely, accurate, and scalable detection of plant stress is critical to reducing these losses. Recent advances in digital imaging and artificial intelligence offer unprecedented opportunities for precision crop disease detection and management. Yet existing plant disease datasets often remain fragmented across crop and disease systems and are largely dominated by controlled-environment imagery. The lack of standardized, interoperable, and representative datasets limits reproducibility, transferability, and scalability of AI systems, thereby constraining their deployment in operational agricultural applications. Here we present LeafMD, an integrated multimodal plant disease dataset and benchmark resource that includes LeafNet 2.0, a large-scale multimodal digital image dataset comprising 255,855 image-text pairs across 37 crop species, 197 crop-disease classes, and 9 geographic regions spanning tropical, subtropical, and temperate agricultural systems. Unlike conventional datasets, LeafNet 2.0 integrates biologically grounded symptom descriptions with image-level annotations of early and late disease stages, enabling symptom-aware analysis of disease progression under realistic field conditions. We further introduce LeafBench 2.0 as part of LeafMD, a visual-question answering benchmark covering nine fine-grained plant pathology tasks, including pathogen classification, lesion characterization, symptom interpretation, and disease severity assessment. Evaluation across 16 vision-language models revealed substantial performance gaps between coarse disease recognition and fine-grained pathological reasoning, while agriculture-adapted models consistently outperformed several larger general-domain architectures on symptom-oriented tasks.
Together, LeafNet 2.0 and LeafBench 2.0 establish LeafMD as a multimodal resource for developing disease-aware agricultural foundation models and studying fine-grained pathological reasoning in real-world environments.

4

Text guidance is powerful but prompt-sensitive for weakly-supervised leaf symptom segmentation

Dubois, R.; Bousset, L.; Jumel, S.; Leclerc, M.; Parisey, N.; Joly, A.

2026-07-10 plant biology 10.64898/2026.07.10.737680 medRxiv

Top 0.1%

4.9%

Accurate segmentation of plant disease symptoms is essential for crop monitoring and phenotyping, yet it typically requires costly pixel-level annotations. Weakly supervised semantic segmentation (WSSS) alleviates this burden using image-level labels, but its performance depends on the quality of spatial priors such as class activation maps (CAMs). We investigate whether text-guided segmentation with the Segment Anything Model 3 (SAM3) can serve as an alternative weak supervision signal. Three pseudo-mask generation strategies are compared: (i) CAMs refined with SAM or SAM3, (ii) zero-shot text-guided SAM3, and (iii) a hybrid approach combining weak spatial cues with text prompts. The resulting pseudo-masks are used to train a DeepLabV3 model. Text guidance alone matches or outperforms conventional WSSS, achieving up to 0.46 IoU without spatial supervision and 0.61 IoU on a public dataset, although performance is sensitive to text prompt formulation. The hybrid strategy improves robustness, reaching 0.50 IoU on the primary dataset and 0.58 IoU on the additional dataset while reducing prompt sensitivity. Overall, text guidance is a promising alternative to conventional weak supervision, while hybrid approaches provide a more robust solution for plant disease segmentation.
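The IoU figures quoted above measure the overlap between a predicted symptom mask and a reference mask. A minimal Python sketch of that metric follows. The two masks are invented toy data, `iou` is a hypothetical helper name, and scoring two empty masks as 1.0 is an assumed convention rather than something stated in the abstract.

```python
def iou(pred, truth):
    """Intersection over union of two binary masks given as nested lists of 0/1."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                inter += 1
            if p or t:
                union += 1
    # Assumed convention: two empty masks agree perfectly.
    return inter / union if union else 1.0


# Toy 3 x 4 masks (assumed): 1 marks a symptomatic pixel.
pred = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
truth = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]

score = iou(pred, truth)
print(f"IoU = {score:.2f}")
```

Here 3 pixels are shared and 6 pixels are marked in at least one mask, so the score is 3 / 6.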

5

RootQuant: Automated Root Trait Quantification from Minirhizotron Images Using Deep Learning

Parth, K.; Varela, S.; Liu, Z.; Martini, K. M.; Rajurkar, A.; Allan, D.; McCoy, S.; Ruhter, J.; Walker, S.; Goldenfeld, N.; Leakey, A.

2026-07-08 plant biology 10.64898/2026.07.07.737053 medRxiv

Top 0.1%

4.4%

Quantifying root traits such as root length (RL) and root surface area (RSA) from minirhizotron imagery is a valuable approach for overcoming the phenotyping bottleneck that limits understanding and improvement of crop productivity, resource use efficiency and resilience in field experiments. However, current approaches remain labor-intensive, and deep learning (DL) methods suffer from limited generalization ability. We present RootQuant, an end-to-end DL model that simultaneously predicts RL and RSA directly from minirhizotron images using only whole-image trait values as supervision, thereby eliminating the need for pixel-level annotations. The model's generalization ability was evaluated across species and fine-tuning configurations. The practical applicability of the model was further assessed under field conditions by converting image-derived RL estimates into volumetric root length density (vRLD). Using 118,191 maize and soybean images collected between 2009 and 2020, RootQuant trained on both species achieved an R² of 0.90 and an RMSE of 2.9 mm for RL, and an R² of 0.88 and an RMSE of 4.2 mm² for RSA. The same mixed-species model generalized strongly across species, yielding an 8% relative improvement in R² and a 30% lower RMSE on maize compared with the same architecture trained on a single species and applied zero-shot. Image-derived RL predictions converted to vRLD showed the expected depth-dependent decline in vRLD, as was also found by coincident destructive quantification of roots washed out of soil cores. By providing a generalist backbone model trained on a large dataset from two major crop species, RootQuant enables high-throughput simultaneous estimation of two relevant root traits directly from raw imagery without task-specific fine-tuning, thereby accelerating in situ root system analysis and phenotyping applications.
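The two accuracy metrics reported above can be reproduced on any set of observed and predicted trait values. The sketch below is a minimal Python illustration with invented numbers. It assumes R² means the coefficient of determination (1 minus residual over total sum of squares), which the abstract does not specify, and `rmse` and `r_squared` are hypothetical helper names.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the same units as the trait."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Toy root lengths in mm (assumed values, not from the paper).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(f"R2 = {r_squared(observed, predicted):.3f}")
print(f"RMSE = {rmse(observed, predicted):.3f} mm")
```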

6

SeedMeasure: an efficient approach and open-source program to quantify seed size

Sims, B.; Gaudinier, A.; Blackman, B.

2026-06-29 plant biology 10.64898/2026.06.27.734974 medRxiv

Top 0.1%

3.3%

Premise: Seed size and morphology are critical traits in agriculture, ecology, and genetics, but high-throughput quantification of these traits is often limited by labor-intensive manual measurements or expensive, platform-specific imaging software. Methods and Results: We developed SeedMeasure, a lightweight, open-source, and cross-platform command-line tool written in Python that automates the measurement of seed area, length, and width from images. Using a simple imaging setup, the program processes images by correcting for perspective skew and filtering debris, then exports quantitative data alongside quality-check images. We validated SeedMeasure across nine diverse species, ranging from small Arabidopsis thaliana seeds to large Zea mays kernels. The tool quickly handles images using multithreading and demonstrates high reproducibility, yielding low coefficients of variation across repeated runs. Conclusions: Compared to existing software, SeedMeasure is free, offers faster processing through parallel computing, and provides standalone executables that require no programming dependencies. SeedMeasure offers an accessible, cost-effective, and high-throughput approach for rapid phenotypic profiling, making advanced seed morphological analysis available to researchers without specialized laboratory hardware.
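Reproducibility across repeated runs is summarised above as a coefficient of variation (CV). A minimal Python sketch of that statistic follows. The measurements are invented, `coefficient_of_variation` is a hypothetical helper rather than part of SeedMeasure, and the use of the sample standard deviation is an assumption.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


# Toy example (assumed): area of one seed in mm^2 across four repeated runs.
areas = [4.0, 4.1, 3.9, 4.0]

cv = coefficient_of_variation(areas)
print(f"CV = {cv:.2f}%")
```

A lower CV means the repeated measurements of the same seed agree more closely.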

7

Far-red timing uncovers cultivar-dependent yield and bolting responses in vertical-farm spinach (Spinacia oleracea L.)

McGovern, C.; Adrio, M.; Aliki, H.; Vichos, R.; Powell, W.; Sharma, R.

2026-07-13 plant biology 10.64898/2026.07.10.737849 medRxiv

Top 0.1%

2.2%

Far-red light (FR; 700-750 nm) is increasingly incorporated into controlled-environment lighting because it can improve photosynthetic efficiency when combined with comparatively shorter wavelengths. In long-day leafy crops such as spinach, however, FR may also promote the transition from vegetative to reproductive growth and thereby reduce marketable yield. Most studies have evaluated FR fraction, intensity or end-of-day exposure, whereas the developmental timing of FR has rarely been tested, particularly in spinach. Here, we evaluated six commercial spinach cultivars (Amador, Harp, Renegade, Responder, Rubino and Santa Cruz) in an indoor vertical farm under a common red-green-blue background (PPFD 260-264 µmol m⁻² s⁻¹, 12 h photoperiod, 24 °C) and four FR timing treatments: no FR (Control), FR throughout production (FullFR), FR during early development only (EarlyFR), and FR during late development only (LateFR). LateFR increased marketable fresh weight relative to Control (244 vs 224 g) and reduced flowering incidence, whereas far-red supplied during early development reduced fresh weight (158 g) and increased flowering. The magnitude of the timing response differed among cultivars: switching from EarlyFR to LateFR recovered 0% fresh weight in Amador but 107% in Renegade and Rubino, with the largest penalties occurring in otherwise bolt-resistant cultivars. EarlyFR also increased total chlorophyll and reduced the chlorophyll a:b ratio. These results show that FR response in spinach is strongly conditioned by developmental stage and cultivar. Although LateFR received more total far-red than EarlyFR, it behaved like the Control, indicating that the penalty was set by far-red timing rather than dose.
Treatment differences in bolting and yield tracked an estimated phytochrome photostationary-state deficit during early development: a phytochrome-deficit model markedly outperformed a cumulative-dose model (ΔAIC = 441), and the deficit × cultivar interaction was strong (p < 0.001), with bolt-resistant cultivars losing most yield when far-red coincided with the early developmental window. We therefore propose that FR should be treated as a genotype-dependent management variable rather than as a fixed spectral input, with late application and bolt-resistant cultivars offering the most favourable combination for vertical-farm spinach production. Framed within the breeder's equation, the close match between the trial and production environment and the scope for shorter breeding cycles indoors suggest that genotype and far-red timing can be optimised jointly to accelerate genetic gain.
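The model comparison above rests on a difference in the Akaike information criterion, AIC = 2k - 2 ln L, where k is the number of fitted parameters and L the maximised likelihood. The Python sketch below shows how such a ΔAIC is computed. The log-likelihoods and parameter counts are invented placeholders and do not reproduce the study's models or its reported value.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L. Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Toy fits (assumed numbers): two models with the same number of parameters.
aic_deficit = aic(log_likelihood=-100.0, n_params=3)
aic_dose = aic(log_likelihood=-110.0, n_params=3)

# Positive delta means the deficit model is preferred over the dose model.
delta_aic = aic_dose - aic_deficit
print(f"AIC deficit = {aic_deficit}, AIC dose = {aic_dose}, delta = {delta_aic}")
```

By a common rule of thumb, a ΔAIC above about 10 already leaves essentially no support for the weaker model.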

8

Knowledge-guided Bayesian optimization using pre-trained LLMs speeds up the identification of superior genotypes from germplasm collection

Hamazaki, K.; Tsuda, K.

2026-07-02 bioinformatics 10.64898/2026.06.28.735149 medRxiv

Top 0.1%

2.1%

Background: Germplasm collections contain wide genetic diversity that is valuable for plant breeding, but conducting phenotypic evaluation for all genotypes in field trials is rarely feasible. Bayesian optimization offers a way to decide, season by season, which genotypes to cultivate in order to identify superior genotypes with fewer evaluations. However, standard Bayesian optimization commonly starts from randomly selected genotypes and mainly relies on surrogate models built from marker genotype information, while the text-based passport information that accompanies germplasm is not fully used. We examined whether pre-trained large language models can provide prior knowledge that improves these decisions in germplasm evaluation. Results: We constructed a large-language-model-guided Bayesian optimization framework that introduces large language models into two parts of the Bayesian optimization workflow. In zero-shot warmstarting, a large language model proposes initial genotypes using passport information such as cultivar name, country of origin, and subpopulation, optionally together with principal component scores derived from genome-wide single-nucleotide-polymorphism markers. In addition, we evaluated a large-language-model-based surrogate model that predicts phenotypic values for untested genotypes using in-context learning from previously evaluated genotypes. Using a rice germplasm panel and two target traits (seed number per panicle for maximization and protein content for minimization), we compared strategies. For seed number per panicle, zero-shot warmstarting with a general-purpose instruction-following model reduced the number of evaluated genotypes needed to reach the best genotype, whereas improvements were small for protein content. 
When genomic information was available, Gaussian-process-based Bayesian optimization was the strongest overall approach, while the large-language-model-based surrogate model outperformed random baselines and was competitive in some settings. When genomic information was not available, predictions based on passport information improved efficiency compared with fully random strategies. Conclusions: Pre-trained large language models can inject useful agronomic knowledge into Bayesian optimization for germplasm evaluation, particularly by improving early-stage genotype selection, and can also support optimization when genomic information is unavailable. As models better handle long genomic sequences together with passport information, large-language-model-guided Bayesian optimization may become a practical and explainable decision-support approach for agricultural optimization.
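In Bayesian optimization, an acquisition function scores each untested candidate from the surrogate model's predicted mean and uncertainty, and the highest-scoring candidates are evaluated next. The abstract does not name the acquisition function used, so the Python sketch below assumes expected improvement for a trait to be maximised. The genotype names, predictions, and the helper `expected_improvement` are all hypothetical.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Expected improvement over the current best, for a maximisation problem."""
    if std <= 0.0:
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf


# Toy surrogate predictions (assumed): genotype -> (predicted mean, predicted std).
candidates = {
    "G1": (100.0, 5.0),
    "G2": (110.0, 1.0),
    "G3": (105.0, 20.0),
}
best_observed = 108.0  # best trait value evaluated so far (assumed)

ranked = sorted(
    candidates,
    key=lambda g: expected_improvement(*candidates[g], best_observed),
    reverse=True,
)
print(ranked)
```

In this toy case the uncertain candidate G3 ranks first although its predicted mean is below the current best, which illustrates the trade-off between exploration and exploitation.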

9

High-throughput stomatal phenotyping provides selection targets for stress-resilient wheat

Mabrouk, M.; Russell, N. J.; Alegria, E. V.; Wang, T.-C.; Liang, J.-A.; Wu, F.-J.; Huang, Y.; Wittkop, B.; Snowdon, R.; Förter, L.; Moritz, A.; Herzog, E.; Ganji, E.; Wehner, G.; Stahl, A.; Chen, T.-W.

2026-07-13 plant biology 10.64898/2026.07.10.737162 medRxiv

Top 0.1%

1.7%

Phenotyping stomatal traits and their developmental plasticity is time-consuming but holds potential to improve water use efficiency and photosynthesis for designing stress-tolerant crops under climate change. Here, we develop a robust, high-throughput pipeline for phenotyping 14 stomatal traits in winter wheat related to size, variation, maximum conductance, and spatial patterning. We (1) analyze over 25,000 images from 60 wheat cultivars grown in growth chamber, greenhouse, and field conditions; (2) investigate the impact of light, temperature, and reduced water and nitrogen supply on stomatal traits and their developmental plasticity across adaxial and abaxial surfaces; and (3) evaluate genetic diversity and breeding progress of stomatal traits. Stomatal traits were highly broad-sense heritable, were largely plastic in response to environmental conditions, and showed genotype-specific responses. Stomatal traits of third leaves under controlled environments with stable light and temperature conditions reliably captured the genetic variance of flag leaves under field conditions. Our data suggest that the upper leaf surface contributed more to transpiration and cooling through consistently higher stomatal density, area, and maximum conductance, while the lower surface facilitated CO2 diffusion via systematic proper patterning and spacing. Breeding maintains the genetic diversity of stomatal traits, and our pipeline enables breeders to target them to enhance water use efficiency in high-yielding modern cultivars.

10

PolliCrop: A high-throughput computer vision pipeline for pollinator monitoring in agroecosystems

Chabert, S.; Bernigaud-Samatan, J.; Blackman, B. K.; Blanchet, N.; Catrice, O.; Donnadieu, C.; Gani, M.; Grousset, R.; Husband, S.; Tueux, G.; Erler, S.; Langlade, N. B.

2026-07-13 animal behavior and cognition 10.64898/2026.07.08.737348 medRxiv

Top 0.2%

0.9%

Flower-visiting insect populations have been declining since the 1990s, especially because of the decrease of floral resources in agricultural settings. Mass flowering crops can help increase resource availability, and plant breeding can be directed towards selecting varieties attracting more flower-visiting insects. This requires the implementation of an automated high-throughput phenotyping tool for assessing the attractiveness of plant genotypes to flower-visiting insects. In this study, (i) we present a procedure to take standardized images of sunflower heads with camera traps continuously at day and night in the field; (ii) we trained two versions of a deep learning model, named PolliCrop, to automatically detect and identify three classes of the main insects visiting sunflower on these images (non-Bombus bees, bumble bees, lepidopterans); (iii) we assessed and validated the ability of PolliCrop to correctly predict the true visitation frequencies of the insect classes on three sunflower genotypes; (iv) we presented two statistical approaches to compare the insect visitation frequencies between plant genotypes, one including weather variables, and the other one without. One PolliCrop version yielded satisfying performance to correctly detect the three insect classes. In particular, it correctly predicted the insect visitation frequencies on two sunflower genotypes in a range of ±10%. The other PolliCrop version can be useful in certain contexts of images and objectives. PolliCrop can be extended in the future to other crop species by training PolliCrop on new images captured in these crops. The field experimental design to set up for comparing the attractiveness between genotypes is also discussed.

11

VigExp: A functionally verified platform for aiding cowpea (Vigna unguiculata) and related legume crop improvement

Su, H.; Mazurkiewicz, D.; Gursanscky, N.; Riboni, M.; Juranic, M.; Johnson, S. D.; Yow, J. H.; Deo, J.; Liu, Y.; Mattinson, A.; Leon-Martinez, G.; Escobar-Guzman, R.; Salinas-Gamboa, R.; Amasende-Morales, I.; Vielle-Calzada, J.-P.; Koltunow, A. M. G.; Ferguson, B. J.

2026-07-09 plant biology 10.64898/2026.06.30.735734 medRxiv

Top 0.2%

0.6%

Legumes include some of the world's most significant crop species, such as cowpea (Vigna unguiculata), a subsistence crop widely grown in sub-Saharan Africa. Despite their importance, legume crop improvement is hindered by a lack of high-resolution expression data, particularly for reproductive tissues and cell types. Here, we report on VigExp, a tool for visualising cowpea gene expression datasets. We demonstrate its utility across a range of vegetative and reproductive cell types of varieties IT97K-499-35 and IT86D-1010, which exhibit 93.75% protein sequence conservation and are amenable to stable transformation. This includes previously published transcriptomes of vegetative, floral and seed tissues, combined with developmentally staged male and female reproductive tissues. Also integrated are novel transcriptomes of laser-captured cell types covering reproductive development from meiosis to early embryo formation post-fertilisation. Spatial expression patterns and transcript levels can be visualised through an electronic fluorescent pictograph (eFP) browser. Validated by RT-qPCR, in situ hybridisation, transgenic, and CRISPR gene editing analyses, the predictive accuracy of VigExp matches prior cowpea functional study observations. Critical genes for nodule development and regulation were also identified and their expression patterns established in cowpea. Novel reference genes, constitutively expressed gene promoters for visualization markers/gene-editing, and tissue and cell specific gene promoters for targeting these regions, are identified. The A-type cyclin, VuTAM2, was also identified, with a critical role in male meiosis established. Collectively, VigExp represents an adaptable and updatable resource to support crop improvement in cowpea and other legumes, which are often highly syntenic with respect to genome composition.

12

From Phenomics to Genomics: Macro-GWAS of Almond Morphology and Quality

Mas Gomez, J.; Rubio Angulo, M.; Duval, H.; Dicenta, F.; Martinez-Garcia, P. J.

2026-07-07 plant biology 10.64898/2026.07.06.736816 medRxiv

Top 0.3%

0.5%

In plant breeding and genetics, recent advances in high-throughput phenotyping are beginning to meet the growing demand for large-scale, high-quality phenotypic data that emerged after the development of next-generation sequencing technologies. Recent developments in phenomics have been incorporated into almond breeding programs, facilitating the large-scale acquisition of quantitative phenotypes and the dissection of the genetic architecture underlying morphological and quality-related traits. The implementation of a high-throughput phenotyping platform integrating RGB and hyperspectral imaging with genotyping using the 60K almond SNP array enabled the large-scale characterization of almond populations and the identification of 567 robust marker-trait associations across 66 traits. These analyses revealed two major genomic hotspots on chromosomes 2 and 5 associated with morphological and quality-related traits. These regions harbored biologically relevant candidate genes, including genes associated with OVATE family proteins, brassinosteroid signaling, protein ubiquitination, and acyl-CoA metabolism, as well as other regulators of organ growth, cell proliferation, hormone signaling, and seed development. Furthermore, a novel candidate gene encoding a COMT-like O-methyltransferase involved in lignin biosynthesis was identified and proposed to contribute to shell hardness, a major genetically controlled trait in almond. Together, these findings demonstrate the potential of integrating high-throughput phenomics and genomics to dissect complex traits, identify candidate genes, and accelerate genomics-informed breeding in almond.

13

Comparison of localGEBV and Optimal Haplotype Stacking Fitness Functions using a Novel R Package: HapSelect

Shaffer, W.; Papin, V.; Carter, Z.; Brunner, S. M.; Tong, J.; Villiers, K.; Robinson, H.; Voss-Fels, K.; Hayes, B. J.; Hickey, L.; Dinglasan, E.

2026-07-13 genetics 10.64898/2026.07.08.737160 medRxiv

Top 0.3%

0.4%

Haplotype-based breeding strategies have emerged as promising approaches to maximize long-term genetic gain by identifying complementary parental combinations while maintaining genetic diversity. However, these methods typically require phased genotypes and more intensive workflow pipelines and skillsets. We developed a novel local genomic estimated breeding value (localGEBV) fitness function with similar intent to the optimal haplotype stacking (OHS) framework fitness function and implemented both in the novel R package, HapSelect. Our aim was to evaluate whether phased haplotypes provide additional benefit over the more easily available dosage-based unphased genotypes in highly inbred crops. A subset of a bread wheat nested association mapping (NAM) population comprising 444 lines genotyped with 6,054 DArT-Seq markers was analysed. Marker effects were estimated using rrBLUP, localGEBV and haplotype effects were calculated across linkage disequilibrium-defined haploblocks, and genetic algorithms (GA) were used to identify optimal sets of 30 founders using either a localGEBV-derived fitness function with unphased dosage inputs or the OHS fitness function with phased inputs. Selected parental sets were compared with conventional truncation selection (TS) through 150 generations of forward simulation. The OHS fitness function achieved a marginally greater optimized ultimate GEBV than the localGEBV fitness function during GA optimization, with only 18 of the 30 selected founders overlapping between the two methods. Despite these differences, forward simulations demonstrated nearly identical long-term genetic gain for localGEBV and OHS-selected founders, with both approaches outperforming conventional truncation selection by maintaining greater genetic diversity and delaying the genetic plateau. The minimal difference between localGEBV and OHS is likely attributable to the high homozygosity of the population, where localGEBV and haplotype effects are nearly confounded.
These results demonstrate that dosage-based localGEBV provides a practical alternative to phased haplotype approaches for parent selection in inbred crops, substantially simplifying genomic workflows while maintaining long-term breeding performance. Future work should evaluate these methods in more diverse inbred populations and outbred species, where greater haplotypic diversity may increase the advantage of true haplotype-based optimizations.

14

Enhancing predictive accuracy of yield traits in cassava through multi-trait genomic prediction

de Freitas, G. M.; Certuche, D. S.; Jannink, J.-L.; de Oliveira, E. J.; Garcia, A. A. F.

2026-07-06 genetics 10.64898/2026.07.01.735838 medRxiv

Top 0.3%

0.4%

Multi-trait genomic prediction offers a practical route to improve selection for costly, complex traits in clonally propagated crops such as cassava. In a Brazilian breeding panel of 1,078 cassava clones genotyped with 25,923 SNPs and phenotyped for six agronomic traits, we compared single-trait (ST) and multi-trait (MT) GBLUP models. Stage-wise mixed models produced BLUEs that fed into ST and MT-GBLUP. We tested five cross-validation schemes that mimic breeder realities: ST baseline (CV1); naive all-traits MT prediction for unphenotyped candidates (CV2); MT prediction using auxiliary trait phenotypes in the test set (CV3); and two sparse-phenotyping regimes with missingness by trait (CV4) or by clone (CV5) at 25%, 50%, and 75% levels. The main results were that, under the ST baseline (CV1), predictive ability ranged from 0.50 for DMC and 0.45 for FRY down to 0.13 for Le.Dis. A naive full MT model (CV2) performed approximately on par with ST-GBLUP. In contrast, MT designs (CV3) that included informative auxiliary traits, such as shoot yield and combinations with plant vigor and leaf disease severity, yielded small gains for DMC with predictive ability of approximately 0.51 (+2%), while FRY predictive ability increased to approximately 0.65 (+44%), accompanied by RMSE reductions for FRY up to approximately 13.5% (e.g. RMSE approximately 6.2). Sparse-phenotyping simulations (CV4/CV5) demonstrated that MT models sustain or even improve predictive ability under realistic missing-data regimes (PA {approx} 0.62 - 0.65). Selection concordance between MT and ST top-10% sets was generally high (>0.80), and MT configurations produced measurable improvements in expected selection response and genetic gain per cycle for several target traits. 
These results indicate that strategically implemented MT-GBLUP, using a small set of biologically and operationally informative auxiliary traits and optimized sparse phenotyping, can materially increase predictive accuracy and selection efficiency for economically critical cassava traits while reducing phenotyping burden.
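The percentage gains quoted in this abstract follow directly from the reported predictive abilities. The Python sketch below checks that arithmetic using the abstract's own figures. It also shows predictive ability computed as a Pearson correlation between predicted and observed values, which is a common definition in genomic prediction but an assumption here. The toy vectors and the helper names `relative_gain` and `pearson` are hypothetical.

```python
import math


def relative_gain(baseline, improved):
    """Percentage change of the improved value relative to the baseline."""
    return (improved - baseline) / baseline * 100.0


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Predictive abilities as reported in the abstract (CV1 baseline vs CV3).
fry_gain = relative_gain(0.45, 0.65)
dmc_gain = relative_gain(0.50, 0.51)
print(f"FRY gain = {fry_gain:.1f}%, DMC gain = {dmc_gain:.1f}%")

# Toy illustration (assumed values) of predictive ability as a correlation.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.2, 1.9, 3.4, 3.7]
print(f"predictive ability = {pearson(observed, predicted):.3f}")
```

The computed gains round to the +44% and +2% stated in the abstract.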

15

Diversity Assessment with SNP, SSR, AFLP, and RAPD Markers in Plants: A Systematic Review and Meta-Analysis

Olagunju, Y. O.; Olawuyi, O. J.

2026-07-07 plant biology 10.64898/2026.07.03.736291 medRxiv

Top 0.3%

0.4%

Background. DNA-based molecular markers underpin plant genetic diversity assessment, germplasm characterisation, and conservation prioritisation. Four marker systems dominate the field: amplified fragment length polymorphisms (AFLPs), simple sequence repeats (SSRs), single nucleotide polymorphisms (SNPs), and random amplified polymorphic DNA (RAPDs). No quantitative meta-analysis had pooled their performance on the canonical diversity metrics: polymorphism information content (PIC), expected heterozygosity (He), and resolution power, across plants. Existing reviews are narrative, marker-restricted, or qualitatively conclusive of infeasibility. Methods. A PRISMA 2020-compliant systematic review (registered at the Open Science Framework) was executed. Eligible studies were within-study paired comparisons genotyping the same accession panel with at least two of {SNP, SSR, AFLP, RAPD} and reporting at least one diversity metric. Effect sizes were paired standardised mean differences (Hedges' g) computed under the Bernoulli-variance approximation. Random-effects REML meta-analysis used metafor 5.0.1 with Knapp-Hartung adjustment, leave-one-out, and r-sensitivity. Results. Fifteen within-study paired contrasts were eligible, distributed across three pools. Pool 2 (SSR vs SNP, He, k = 5) yielded a pooled Hedges' g of 0.494 (95% CI: -0.078 to 1.066, p = 0.075; I-squared = 90.2%; 95% PI [-0.82, 1.81]). SSRs exceeded SNPs on He in 4 of 5 studies; leave-one-out removal of the panel-size-asymmetric outlier raised the estimate to g = 0.644 (p = 0.025). Pool 3a (dominant-marker stratum, k = 6) yielded g = 0.419 (95% CI: -0.121 to 0.960, p = 0.103; I-squared = 56.5%); five of six contrasts showed SSR or AFLP exceeding RAPD on per-locus PIC. Pool 1 (PIC, k = 3, exploratory) gave a consistent direction (g = 0.453). All three pools point in the same direction: codominant or AFLP markers carry more per-locus information than the alternative being compared. Conclusions.
SSR markers reported higher per-locus diversity than SNP and RAPD markers in plant within-study paired comparisons, mechanistically grounded in the SNP biallelic ceiling and the multi-allelic richness of SSRs. The effect attenuated or reversed in selfing/low-diversity panels and at the per-panel level when SNP panels exceeded approximately 1000 loci. RAPDs show the lowest per-locus information content of the four classes.
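
The "biallelic ceiling" mentioned in the conclusions can be illustrated with a minimal sketch. It assumes the standard per-locus definitions, He = 1 - sum(p_i^2) and the Botstein-style PIC; the abstract does not state which formulas the pooled studies used. The allele frequencies and function names below are hypothetical, chosen only for illustration.

```python
from itertools import combinations


def _check(freqs):
    """Validate that allele frequencies are non-negative and sum to 1."""
    if any(p < 0 for p in freqs) or abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must be non-negative and sum to 1")


def expected_heterozygosity(freqs):
    """Expected heterozygosity at one locus: He = 1 - sum(p_i^2)."""
    _check(freqs)
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content: He - sum_{i<j} 2 * p_i^2 * p_j^2."""
    he = expected_heterozygosity(freqs)
    correction = sum(2.0 * p * p * q * q for p, q in combinations(freqs, 2))
    return he - correction


# Hypothetical loci: a balanced biallelic SNP and an SSR with five
# equally frequent alleles (illustrative values, not data from the study).
snp = [0.5, 0.5]
ssr = [0.2] * 5

print(f"SNP: He={expected_heterozygosity(snp):.3f} PIC={pic(snp):.3f}")
print(f"SSR: He={expected_heterozygosity(ssr):.3f} PIC={pic(ssr):.3f}")
# SNP: He=0.500 PIC=0.375
# SSR: He=0.800 PIC=0.768
```

Under these definitions a biallelic locus cannot exceed He = 0.5 or PIC = 0.375 however the frequencies are set, whereas a multi-allelic locus can approach 1, which is consistent with the direction of the pooled effects reported above.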

16

GuavaVision AI: An Explainable Deep Learning Framework for Automated Classification, Lesion Localization, and Segmentation of Guava Diseases

Biswas, J.; Islam, M.; Bangabashi, M. M.; Akter, M.; Nishi, T. S.; Sheikh, M. K.; Mia, M. R.; Anwar, M. M.

2026-06-23 bioengineering 10.64898/2026.06.18.733093 medRxiv

Top 0.3%

0.4%


Guava cultivation is considerably affected by foliar and fruit diseases whose overlapping symptoms and environmental variability make accurate field-level diagnosis challenging. Many studies have sought efficient methods for diagnosing plant diseases, but most focus on image-level classification and do not combine lesion localization and pixel-level segmentation within a single analysis framework. This study proposes a comprehensive automated image-analysis framework that classifies guava leaf and fruit diseases at the image level, localizes lesions, and segments them at the pixel level, using multiple images of each disease type collected under various growing conditions. The dataset was enriched through three augmentation strategies (standard preprocessing, structured augmentation, and GAN-based synthetic image generation), expanding the effective training data to approximately 7,000 images; a 5-fold cross-validation strategy guided model selection, and final performance was assessed on a held-out test set. Among the multiple state-of-the-art convolutional neural networks (CNNs) evaluated for classifying guava leaf and fruit diseases, the ResNet50+DenseNet121 fusion model achieved the highest classification accuracy, 98.20%. For lesion detection and segmentation, YOLOv8-seg outperformed Mask R-CNN, achieving mAP@0.5 of 0.907 and 0.889, and mAP@0.5:0.95 of 0.783 and 0.769 for detection and segmentation, respectively, with a balanced precision-recall profile. Explainable AI (XAI) techniques were used to increase the transparency of the model by identifying image regions that are significant to the actual lesion. The framework was further designed with practical web-based deployment in mind, evaluating both lightweight and high-capacity models to balance computational efficiency against predictive accuracy. The study concludes that combining model fusion, data augmentation, and segmentation-aware lesion detection provides an effective approach to managing guava diseases.

17

Multi-trait evaluation of a tomato MAGIC population identifies promising lines with improved nitrogen use efficiency (NUE)

Baraja-Fonseca, V.; Gil-Villar, D.; Bancic, J.; Renau-Morata, B.; Salud Justamante, M.; Plazas, M.; Gramazio, P.; Vilanova, S.; Perez-Perez, J. M.; Granell, A.; Molina, R. V.; Nebauer, S. G.; Prohens, J.; Arrones, A.

2026-07-15 plant biology 10.64898/2026.07.14.738388 medRxiv

Top 0.3%

0.4%


Nitrogen-use efficiency (NUE) is a pivotal breeding target in tomato (Solanum lycopersicum L.) to sustain production under reduced N inputs. Here, we leveraged a recently developed tomato multi-parent advanced generation inter-cross (ToMAGIC) population to identify lines with superior performance under reduced N availability. The eight founders and a core subset of 118 ToMAGIC lines were characterized with 10,684 SNP markers and evaluated under optimal (opN, 15 mM) and suboptimal (subN, 8 mM) N supply in an experiment totalling 1,576 plants, generating 48,068 data points across 61 phenotypic variables. Under both N treatments, ToMAGIC lines exhibited transgressive segregation for most traits, confirming the value of this population as a reservoir of untapped variation. Notably, under subN conditions, harvest index (Hi) increased by 29-44%, suggesting adaptive resource redistribution toward reproductive sinks. Variance partitioning revealed that agronomic and NUE-related traits were largely under genetic control, with heritability estimates frequently above 0.80 and broadly conserved across N treatments. Multivariate trait analysis identified fruit yield N concentration (NUE component, CN,y), shoot biomass N content (NAb), and shoot growth-related traits as the main drivers of treatment differentiation. Finally, proxy traits were prioritized by integrating response magnitude, heritability, trait correlations, and treatment-discriminatory power into multi-trait selection indices. This strategy generated favorable predicted genetic gains, reaching 158% for high-performance lines and 170% for subN-adapted lines, and consistently identified lines 402, 428, 518, 800, and 816 as promising pre-breeding materials. Overall, this study supports ToMAGIC as a powerful resource for developing N-efficient cultivars suited for sustainable agriculture.

18

Apical3DTip: Elliptic Cross-section-based Reconstruction for the Embryo Initial Cell of Arabidopsis

Nonoyama, T.; Kang, Z.; Hanaki, Y.; Itagaki, Y.; Matsumoto, H.; Kimata, Y.; Tsugawa, S.; Ueda, M.

2026-07-09 plant biology 10.64898/2026.06.25.734685 medRxiv

Top 0.4%

0.3%


Background. Cell geometry plays a central role in determining division orientation and body axis formation during early embryogenesis in Arabidopsis thaliana. However, quantitative analysis of dynamic three-dimensional (3D) morphology remains challenging because live-imaging studies often rely on two-dimensional (2D) projections, while existing 3D reconstruction approaches, including mesh-based methods, often lose the original orientation information relative to the ovule and require labor-intensive mesh correction. In addition, embryo positional fluctuation caused by floating in liquid medium and continuous growth makes it difficult to analyze temporal morphological changes within a common coordinate system. Results. We developed a robust framework for quantitative 3D and four-dimensional (4D; 3D + time) analysis of embryo initial cell (apical cell) morphology. The method first establishes a standardized 3D coordinate system by normalizing cell orientation based on the bottom plane and the optical axis of the observation. Cell morphology is then reconstructed through ellipse-based approximation of serial cross-sections extracted from stacked imaging data, enabling accurate geometric characterization without the need for complex surface mesh reconstruction. To evaluate shape anisotropy, we quantified the apical cell shape in 3D. The framework further supports the characterization of volumetric features of subsequent division, providing a basis for quantifying 3D embryogenesis. Conclusion. Our framework provides a simple and noise-reduced approach for quantitative analysis of living cell morphology in 3D. We named the integrated method of combining coordinate normalization with elliptical cross-section-based reconstruction Apical3DTip. This method enables consistent comparison of cell shapes without extensive manual corrections. The method overcomes key limitations of 2D projection-based and mesh-dependent analyses and offers a practical platform for quantifying cell shape and daughter cell shapes in 3D. More broadly, it provides a quantitative foundation for exploring the relationship between cell geometry, morphodynamics, and developmental patterning in living plant embryos.
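
The idea of describing a cell by ellipses fitted to serial cross-sections can be sketched as follows. This is not the Apical3DTip implementation, whose details the abstract does not give. It assumes each section has already been reduced to a height `z` and two semi-axes, and it uses trapezoidal integration of section area for volume and a simple axis ratio for anisotropy. All names and values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class EllipseSection:
    """One cross-section: height z along the cell axis and fitted semi-axes."""
    z: float
    a: float  # semi-major axis
    b: float  # semi-minor axis

    @property
    def area(self) -> float:
        return math.pi * self.a * self.b

    @property
    def aspect_ratio(self) -> float:
        return self.a / self.b


def stack_volume(sections):
    """Approximate volume by trapezoidal integration of section area along z."""
    ordered = sorted(sections, key=lambda s: s.z)
    volume = 0.0
    for lower, upper in zip(ordered, ordered[1:]):
        volume += 0.5 * (lower.area + upper.area) * (upper.z - lower.z)
    return volume


def mean_aspect_ratio(sections):
    """Average semi-axis ratio across sections as a crude anisotropy measure."""
    return sum(s.aspect_ratio for s in sections) / len(sections)


# Hypothetical stack: an elliptic cylinder that tapers toward the tip.
stack = [
    EllipseSection(z=0.0, a=4.0, b=3.0),
    EllipseSection(z=1.0, a=4.0, b=3.0),
    EllipseSection(z=2.0, a=3.0, b=2.5),
    EllipseSection(z=3.0, a=1.5, b=1.5),
]
print(f"volume ~ {stack_volume(stack):.2f}, mean a/b = {mean_aspect_ratio(stack):.2f}")
```

Because every section is stored in the same coordinate frame, stacks from different time points can be compared directly. That is the property the abstract attributes to coordinate normalization.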

19

A genetic toolkit to reduce wheat immunogenicity and incidence of celiac disease

Rottersman, M. G.; Laudencia-Chingcuanco, D.; Zhang, W.; Guzman-Lopez, M. H.; Lin, J. W.; Zhang, J.; Caseys, C.; Burguener, G.; Kim, S.; Zhang, X.; Yunusbaev, U.; Akhunov, E.; Lee, J.-Y.; Dubcovsky, J.

2026-07-08 plant biology 10.64898/2026.06.23.734071 medRxiv

Top 0.4%

0.3%


Celiac disease (CeD) is an immune-mediated condition triggered by wheat gluten in genetically predisposed individuals. The immune reaction in people with CeD is driven by particular gluten amino acid sequences, or immunogenic epitopes. Some of these epitopes elicit strong immune responses in the majority of CeD patients and are designated as immunodominant epitopes. Previous research has shown correlations between the amount of immunogenic wheat epitopes consumed and the onset of CeD, suggesting that reducing wheat immunogenic epitopes may reduce CeD incidence at the population level. Gluten consists of gliadins and glutenins, with gliadins having the majority of the immunodominant epitopes and glutenins playing a major role in dough strength and breadmaking quality (BMQ). This study used radiation-induced deletions, chemical mutagenesis, and natural variation in wheat (Triticum aestivum) to generate genetic stocks with reduced immunogenic epitope content. Most lines were developed in the wheat cultivar Summit, for which we produced a full genome assembly and annotation. We used exome capture to characterize these deletions and identify prolamins located within and outside the deletions. We combined different deletions and developed molecular markers to facilitate their deployment. For chromosome arms 1BS and 1DS, we generated two alternative lines: one lacking immunogenic epitopes for the development of CeD-safe genetic stocks for research purposes, and another retaining selected glutenins for breeding commercial lines with reduced immunogenicity and adequate BMQ. By making these non-transgenic genetic stocks publicly available, we aim to accelerate the development of wheat varieties with reduced immunogenicity and, eventually, a fully CeD-safe wheat.

20

Sunrise and sunset times are the main factors that determine the flowering time of photoperiod-sensitive sorghum

Clerget, B.; Sidibe, M.; vom Brocke, K.; Raharinivo, V.; Ortiz, D.; Trouche, G.

2026-07-08 plant biology 10.64898/2026.06.12.731875 medRxiv

Top 0.6%

0.1%


Crop photoperiodism models assume that flowering time is primarily controlled by daylength, yet many field observations contradict this view. We previously proposed an alternative framework integrating daily changes in sunrise and sunset times (dSR and dSS). Variety trials in Madagascar and in Argentina supported this concept: mid-late sorghum varieties from the northern hemisphere flowered late or very late when sown in November and December, consistent with the higher dSR/dSS values of the southern hemisphere summer. One Malian variety, sown monthly over six years in West Africa, exhibited high interannual variability in flowering time when sown between November and February. This revealed that up to four photoperiodic responses -- two quantitative and two qualitative, occurring at different times of the year -- may coexist within a single late photoperiod-sensitive variety. All responses use only dSR and dSS cues. The qualitative responses are triggered by an internal phasic coincidence, which is set by a linear relationship between dSR and dSS at the onset of plant photoperiod sensitivity, and between dSR+dSS at panicle initiation. The research model fitted data from 28 varieties grown in Mali well. It also accurately fitted the duration to PI observed in three varieties sown at tropical and temperate latitudes. Highlight. The seasonal photoperiodic adaptation of flowering time in sorghum plants may rely on several signal transduction pathways regulated by sunrise and sunset times rather than day length.
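
To make dSR and dSS concrete, the sketch below estimates sunrise and sunset times from latitude and day of year, then takes their day-to-day change. It is not the authors' model. It assumes textbook approximations (Cooper's declination formula and a common three-term equation of time), ignores atmospheric refraction, and expresses times in local mean time. The units (minutes per day), the sign convention, and the function names are assumptions, and the latitude is only an illustrative value near Bamako, Mali.

```python
import math


def sun_times(day_of_year, latitude_deg):
    """Return (sunrise, sunset) in hours of local mean time (approximate)."""
    # Solar declination in degrees (Cooper's approximation).
    decl = 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))
    # Hour angle of sunrise/sunset from cos(w) = -tan(lat) * tan(decl).
    cos_w = -math.tan(math.radians(latitude_deg)) * math.tan(math.radians(decl))
    cos_w = max(-1.0, min(1.0, cos_w))  # clamp for polar day/night
    half_day = math.degrees(math.acos(cos_w)) / 15.0  # hours
    # Equation of time in minutes (apparent minus mean solar time).
    b = math.radians(360.0 * (day_of_year - 81) / 364.0)
    eot = 9.87 * math.sin(2 * b) - 7.53 * math.cos(b) - 1.5 * math.sin(b)
    solar_noon = 12.0 - eot / 60.0
    return solar_noon - half_day, solar_noon + half_day


def daily_change(day_of_year, latitude_deg):
    """Return (dSR, dSS) in minutes per day using a central difference."""
    sr_prev, ss_prev = sun_times(day_of_year - 1, latitude_deg)
    sr_next, ss_next = sun_times(day_of_year + 1, latitude_deg)
    d_sr = (sr_next - sr_prev) / 2.0 * 60.0
    d_ss = (ss_next - ss_prev) / 2.0 * 60.0
    return d_sr, d_ss


# Illustrative latitude (about 12.6 N); compare with its southern mirror.
for lat in (12.6, -12.6):
    d_sr, d_ss = daily_change(335, lat)  # around 1 December
    print(f"lat {lat:+.1f}: dSR = {d_sr:+.2f} min/day, dSS = {d_ss:+.2f} min/day")
```

In this approximation the difference dSS - dSR equals the daily change in daylength, while the sum dSR + dSS tracks the drift of solar noon caused by the equation of time. The two cues therefore carry information that daylength alone does not.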